Matching of structural motifs using hashing on residue labels and geometric filtering for protein function prediction.
Abstract
There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity. Our focus is on methods that determine binding site similarity. Although several such methods exist, it remains challenging to quickly find all functionally related matches for structural motifs in large data sets with high specificity. In this context, a structural motif is a set of 3D points annotated with physicochemical information that characterize a molecular function. We propose a new method called LabelHash that creates hash tables of n-tuples of residues for a set of targets. Using these hash tables, we can quickly look up partial matches to a motif and expand those matches to complete matches. We show that by applying only very mild geometric constraints we can find statistically significant matches with extremely high specificity in very large data sets and for very general structural motifs. We demonstrate that our method requires a reasonable amount of storage when employing a simple geometric filter, and that it further improves on the specificity of our previous work while maintaining very high sensitivity. Our algorithm is evaluated on 20 homolog classes, with a non-redundant version of the Protein Data Bank as the background data set. We use cluster analysis to examine why certain classes of homologs are more difficult to classify than others. The LabelHash algorithm is implemented on a web server at http://kavrakilab.org/labelhash/.
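The hash-then-expand idea summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' LabelHash implementation: the tuple size `N`, the distance cutoff `MAX_DIST`, the tolerance `TOL`, the use of the first `N` motif points as the lookup seed, the single label per motif point, and the depth-first expansion are all hypothetical simplifications chosen for illustration. Each stored key is the sorted tuple of residue labels of an n-tuple whose pairwise distances pass a simple geometric filter; a motif is matched by looking up its seed key and then extending each geometrically consistent partial match one motif point at a time.

```python
# Illustrative sketch of hashing residue-label n-tuples; not the published LabelHash code.
import math
from collections import defaultdict
from itertools import combinations, permutations
from typing import Dict, Iterator, List, Tuple

# A residue (or motif point) is a (label, (x, y, z)) pair.
Residue = Tuple[str, Tuple[float, float, float]]

# Hypothetical parameters chosen for illustration; not values from the paper.
N = 3            # size of the residue n-tuples used as hash keys
MAX_DIST = 16.0  # geometric filter: max pairwise distance (Angstrom) within a stored tuple
TOL = 1.0        # allowed deviation (Angstrom) between motif and target distances


def build_table(targets: Dict[str, List[Residue]], n: int = N, max_dist: float = MAX_DIST):
    """Hash every n-tuple of residues that passes the distance filter, keyed by sorted labels."""
    table = defaultdict(list)
    for tid, residues in targets.items():
        for idx in combinations(range(len(residues)), n):
            coords = [residues[i][1] for i in idx]
            if all(math.dist(p, q) <= max_dist for p, q in combinations(coords, 2)):
                # Order residue indices by label so the key does not depend on residue order.
                ordered = tuple(sorted(idx, key=lambda i: residues[i][0]))
                key = tuple(residues[i][0] for i in ordered)
                table[key].append((tid, ordered))
    return table


def consistent(motif: List[Residue], residues: List[Residue],
               match: Dict[int, int], tol: float = TOL) -> bool:
    """Check that labels agree and all pairwise distances agree within tol."""
    if any(motif[m][0] != residues[r][0] for m, r in match.items()):
        return False
    for (m1, r1), (m2, r2) in combinations(match.items(), 2):
        d_motif = math.dist(motif[m1][1], motif[m2][1])
        d_target = math.dist(residues[r1][1], residues[r2][1])
        if abs(d_motif - d_target) > tol:
            return False
    return True


def expand(motif, residues, match, remaining, tol=TOL) -> Iterator[Dict[int, int]]:
    """Depth-first extension of a partial match to the remaining motif points."""
    if not remaining:
        yield dict(match)
        return
    m, rest = remaining[0], remaining[1:]
    used = set(match.values())
    for r, (label, _) in enumerate(residues):
        if r in used or label != motif[m][0]:
            continue
        match[m] = r
        if consistent(motif, residues, match, tol):
            yield from expand(motif, residues, match, rest, tol)
        del match[m]


def find_matches(motif, targets, table, n=N, tol=TOL) -> List[Tuple[str, Tuple[int, ...]]]:
    """Look up the first n motif points in the table, then expand seeds to complete matches."""
    seed = list(range(n))
    key = tuple(sorted(motif[i][0] for i in seed))
    results = set()
    for tid, ordered in table.get(key, []):
        residues = targets[tid]
        for perm in permutations(ordered):  # try every correspondence; labels are checked below
            match = dict(zip(seed, perm))
            if consistent(motif, residues, match, tol):
                for full in expand(motif, residues, match, list(range(n, len(motif))), tol):
                    results.add((tid, tuple(full[i] for i in range(len(motif)))))
    return sorted(results)


motif = [("HIS", (0, 0, 0)), ("ASP", (4, 0, 0)), ("SER", (0, 5, 0)), ("GLY", (0, 0, 6))]
targets = {
    # T1 contains the motif translated by (10, 10, 10), plus one distant unrelated residue.
    "T1": [("ALA", (30, 30, 30)), ("HIS", (10, 10, 10)), ("ASP", (14, 10, 10)),
           ("SER", (10, 15, 10)), ("GLY", (10, 10, 16))],
    # T2 has the same labels but a distorted HIS-ASP distance (9 vs 4 Angstrom).
    "T2": [("HIS", (0, 0, 0)), ("ASP", (9, 0, 0)), ("SER", (0, 5, 0)), ("GLY", (0, 0, 6))],
}
table = build_table(targets)
print(find_matches(motif, targets, table))  # [('T1', (1, 2, 3, 4))]
```

Indexing every n-tuple costs on the order of R^n candidate tuples per target of R residues before filtering, which is why a distance filter such as the hypothetical `MAX_DIST` matters for storage. This sketch also omits the statistical significance assessment of matches described in the abstract.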
Similar resources
Matching of Proteins
Most biological actions of proteins depend on some typical parts of their three-dimensional structure, called 3D motifs. To automatically discover corresponding 3D motifs between proteins, we propose a new 3D substructure matching algorithm based on geometric hashing techniques. The key feature of the method is the introduction of a 3D reference frame attached to each amino acid. This allows to...
LabelHash: A Flexible and Extensible Method for Matching Structural Motifs
There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity. Below, we will briefly describe LabelHash, a new method for ...
Fast Bayesian Shape Matching Using Geometric Algorithms
We present a Bayesian approach to comparison of geometric shapes with applications to classification of the molecular structures of proteins. Our approach involves the use of distributions defined on transformation invariant shape spaces and the specification of prior distributions on bipartite matchings. Here we emphasize the computational aspects of posterior inference arising from such model...
A geometric algorithm to find small but highly similar 3D substructures in proteins
MOTIVATION Most biological actions of proteins depend on some typical parts of their three-dimensional structure, called 3D motifs. It is desirable to find automatically common geometric substructures between proteins to discover similarities in new structures or to model precisely a particular motif. Most algorithms for structural comparison of proteins deal with large (fold) similarities. Her...
Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name ...
Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference
Volume 7
Publication year: 2008